\makchap{\introduction}
\makchap{\researchdirection}
\makchap{\taskstatement}
\makchap{\compgoals}
\makchap{\approach}
\makchap{\timetable}
\makchap{\deliverables}
\makchap{\budget}

\def\textpart{F}            % preliminary is roman numeral numbering
\gdef\chapstrt{F}
\gdef\pagstrt{T}            % pages start numbering now
\setcount0 0                % start page count at -1

\setchaptitle{Abstract}
\headcentre{ABSTRACT}

The Stanford Artificial Intelligence Laboratory proposes to the
Rome Air Development Center
to carry out a program of research to
advance the level of technology in automated stereo compilation of terrain
and complex cultural scenes.

The program includes three elements:
\chart{1)}{analysis and implementation of advanced stereo correspondence algorithms; }
\chart{2)}{design and implementation of a powerful and general modularized
stereo mapping system;}
\chart{3)}{development of functional specifications of
an advanced stereo mapping system,
relevant to the interests of the Air Force and the Defense Mapping Agency.}

The proposed work builds upon two recently completed projects sponsored in
our laboratory by Rome Air Development Center:  a) {\it ``Survey of Stereo
Mapping Systems and Related and Supporting Techniques and Literature''} and
b) {\it ``Preliminary Design of an Automated Stereo Compilation System''}.

It is expected that this work will be followed by a development effort to
provide an advanced stereo compilation facility. The development
facility will integrate manual and automatic stereo mapping techniques to
enhance interactive mapping productivity.

The proposal requests support for 24 months: $\$311296.00$ for
year 1, from October 1, 1982 to September 30, 1983, and $\$341751.00$
for year 2, from October 1, 1983 to September 30, 1984.

\newpage
\ignore{Table of Contents goes here}
\setchaptitle{Introduction}
\setchapter{\introduction}
\chapheading{INTRODUCTION}

\def\textpart{T}
\def\notchap{F}             % say Chapter
\setcount0 0                % page counters
\gdef\chapstrt{T}

The Stanford Artificial Intelligence Laboratory proposes to the
Rome Air Development Center
to carry out a program of research to
advance the level of technology in automated stereo compilation of terrain
and complex cultural scenes.

The program includes three elements:
\chart{1)}{analysis and implementation of advanced stereo correspondence algorithms; }
\chart{2)}{design and implementation of a powerful and general modularized
stereo mapping system;}
\chart{3)}{development of functional specifications of
an advanced stereo mapping system with significant automation,
relevant to the interests of the Air Force and the Defense Mapping Agency.}

The proposed work builds upon two recently completed projects sponsored in
our laboratory by Rome Air Development Center:  a) {\it ``Survey of Stereo
Mapping Systems and Related and Supporting Techniques and Literature''} and
b) {\it ``Preliminary Design of an Automated Stereo Compilation System''}.

It is expected that this work will be followed by a development effort to
provide an advanced stereo compilation facility. The development
facility will integrate manual and automatic stereo mapping techniques to
enhance interactive mapping productivity.

The proposal requests support for 24 months: $\$311296.00$ for
year 1, from October 1, 1982 to September 30, 1983, and $\$341751.00$
for year 2, from October 1, 1983 to September 30, 1984.

\newpage
\setchaptitle{Research Direction}
\setchapter{\researchdirection}
\chapheading{RESEARCH DIRECTION}

The Stanford Artificial Intelligence Laboratory proposes to the
Rome Air Development Center
to carry out a program of research 
to advance the level of technology in automated stereo compilation of
terrain and complex cultural scenes. The product of the proposed research
will be a stereo mapping system demonstrating increased accuracy, reliability,
level of detail, and level of automation over present
mapping techniques and their extensions.

The emphasis of the research will be on demonstrating
capability of automating significant compilation tasks.
Consideration will be given to speed of mapping, but speed will be a secondary
goal. It is believed that minor tuning will make the techniques developed
sufficiently fast to be practical for compilation tasks.

\headsection{Elements of the Research}

The research program we propose includes three elements:
\chart{1)}{analysis, implementation, and evaluation of advanced stereo correspondence
algorithms; }
\chart{2)}{design and implementation of a powerful and general modularized stereo
mapping system; and}
\chart{3)}{development  of a functional specification of an
advanced stereo mapping system with significant automation.}

That is, we will study new and existing correspondence techniques, integrate these
into an advanced mapping system, and document the results of this research for
transfer to follow-on work.

\headsection{System Capabilities}

The proposed research will advance the technology of automated stereo compilation
through demonstration of the following capabilities:
\chart{1)}{accurate segmentation of surface boundaries and measurement of surface
  dimensions for discontinuous terrain and cultural features;}
\chart{2)}{high resolution in description of small features, \eg wires and poles;}
\chart{3)}{effective mapping in areas with repetitive texture or featureless texture;}
\chart{4)}{accurate description of curved surfaces;}
\chart{5)}{description of vertical surfaces --- compilation over occluded and steeply
	   inclined regions;}
\chart{6)}{accurate three-dimensional fill-in of image detail appearing
monoscopically in a stereo pair;}
\chart{7)}{symbolic surface description;}
\chart{8)}{provision of semi- to fully-automated compilation of digital imagery
	   to reduce present man-hour requirements in mapping;}
\chart{9)}{acceptable computation time for the automated part of the stereo
compilation.}

	
\headsection{Output Products}

Output from the automated stereo mapping process will include:

\vspssml
\chart{1)}{digital terrain elevation data --- a table of elevations
at earth coordinates, together with an interpolation function, in a format amenable to
DTED production;}
\chart{2)}{surface shell --- a surface and edge-oriented surface map of heights;}
\chart{3)}{symbolic surface shell ---
symbolic surface information corresponding to cultural construction
elements, planes and cylindrical surfaces, horizontal and vertical.
Reliable production of DFAD symbolic classifications is not now within the
state of the art of image understanding. The digital feature descriptions
here will be exploitable for interactive compilation of DFAD.}



\headsection{Stereo Research at Stanford}

The Stanford Artificial Intelligence Laboratory is uniquely qualified to
carry out this stereo mapping research. We have over a decade of
research experience in designing and implementing computer-based stereo vision
systems ([\quam], [\nevatiathesis], [\baumgart], [\hannah],
[\ganapathy], [\arnold], [\moravec], [\gennery], [\arnbinf], [\bakerijvanc], [\arnoldthesis]), and
have made, through this work, many vital contributions to the state of the art of
digital stereo mapping.  The {\it ``Survey of Stereo Mapping Systems and Related
and Supporting Techniques and Literature''} recently completed by our group for
Rome Air Development Center documents the principal digital mapping systems
and techniques, critically reviews contributions to important aspects of related
image analysis work, and summarizes the state of current mapping technology.
A subsequent report from our laboratory, {\it ``Preliminary Design of an Automated
Stereo Compilation System,''} specifies a preliminary design for the
rule-based mapping system discussed in this proposal.

\newpage
\setchaptitle{Task Statement}
\setchapter{\taskstatement}
\chapheading{TASK STATEMENT}

	This section contains a summary of tasks,
followed by a detailed task breakdown:

\headsection{Summary of Tasks}
\chart{1)}{Selection of appropriate test images.}
\chart{2)}{Development of hierarchical rule-based system concepts:}
\chartsml{.1}{Design of advanced stereo correspondence research system;}
\chartsml{.2}{Review of system design;}
\chartsml{.3}{Implementation of stereo correspondence research system;}
\chartsml{.4}{Review of implementation.}
\chart{3)}{Theoretical analysis, development, and implementation of algorithms:}
\chartsml{.1}{Line finding, curve linking, and vertex estimation;}
\chartsml{.2}{Geometric constraints for general surfaces;}
\chartsml{.3}{Geometric constraints for special surfaces;}
\chartsml{.4}{Correspondence evaluation functions and search procedures;}
\chartsml{.5}{Evaluation of technology.}
\chart{4)}{Development of functional specifications for follow-on 
development effort:}
\chartsml{.1}{Development of preliminary functional specification;}
\chartsml{.2}{Review of preliminary functional specifications;}
\chartsml{.3}{Development of revised functional specification;}
\chart{5)}{Intermediate and final reports.}

	Detailed descriptions of the above tasks follow:

 
\sheadsection{Selection of Test Images}

 Several classes of appropriate test images will be selected in
consultation with RADC and DMA.  The images should be of
realistic difficulty, \ie they should reflect typical rather than worst case
circumstances.  It is expected that images will include relatively dense
cultural configurations.  The images may be acquired with devices of
diverse geometry.  Ground truth should be provided where available.
It is expected that image digitization will be performed by the DMA.

Sample Defense Mapping Agency compilation results from DTED and DFAD
analyses at levels I and II will be used to demonstrate mapping goals for
our system. These will provide benchmarks against which we will gauge the
functioning of our compilation system. The production of DTED and DFAD is not
an objective of this research; the product of our research efforts will be
a system whose output is amenable to DTED and DFAD extraction.

\sheadsection{Development of Hierarchical Rule-Based System Concepts}

    The starting place for development of rule-based system concepts will
      be the ACRONYM system at Stanford.  A major exercise for system development
      will be design of a research system at Stanford.  It is emphasized
      that this will be a research system only.  Its design and implementation
      involve systematization and
      incorporation of state of the art algorithms into an advanced, automated
      stereo compilation system, modularized to facilitate debugging, design,
      interfacing and incorporation of new concepts and algorithms.
      A preliminary design has been completed and presented to RADC in the
      Stanford report, {\it ``Preliminary Design of an Automated Stereo
      Compilation System''}.
      This will be followed by a design review
      of the research system, initial implementation of the research system and an
      implementation review of the research system.
      This system building effort will provide an experimental capability for
      evaluating results of research in algorithms relevant to stereo vision.  
      The system building effort is also an excellent way to provide a strong
      check on Stanford input into the design of the follow-on system, and
      to provide implementation examples for the follow-on development system.
      This effort will be simultaneous with 
      exploratory research focussed on development of algorithms critical for
      stereo vision, including:
      line finding, curve linking, and vertex estimation;
      geometric constraints for general surfaces, including shadows;
      geometric constraints for special surfaces, including OTVs.
      A quantitative assessment of relevant technologies at Stanford and
      elsewhere will be made and documented at the completion of the effort.

\sheadsection{Theoretical Analysis, Development, and Implementation of Algorithms}

The strategy will be to take a broad, goal-driven sampling
of correspondence algorithms, directed at those which will advance system
performance most, weighted by expected effort.
Algorithm development in most areas will be curtailed
in breadth and depth during the proposal period, and some areas will be 
neglected entirely.
The Stanford Artificial Intelligence Laboratory plans to augment this program if
possible, and
intends to coordinate support for a critical mass effort in stereo compilation.

\ssheadsection{Line Finding, Curve Linking, and Vertex Estimation}

An intermediate level system will be developed which provides linked edges
of medium-high quality. It will surpass the performance of the Nevatia-Babu
system [\nevatiababu] in accuracy of determination of edge positions and angles;
it will segment linked edges into curved elements instead of straight line
segments as in Nevatia-Babu;
it will provide linked, curved edges, segmented at cusps, and extrapolated to 
vertices.  It will have noticeable limitations in sensitivity (missing faint 
edges humans see); it will have problems in textured areas (missing boundaries
distinguishable only by texture and missing some intensity boundaries in complex
textured areas).
This research effort will integrate superior algorithms for edge segmentation from
among those currently available.  This will principally involve exploiting
recent results from Stanford [\marimontiu].  If further development of
algorithms along these lines is continued by Stanford, it will be
under other support.

\ssheadsection{Geometric Constraints for General Surfaces}

A brief investigation will search for new constraints within epipolar
planes.  Extrapolation of surfaces into areas obscured in one view will
be analyzed and implemented.  Constraints for extended edges 
between epipolar planes will be analyzed and implemented. 
Small errors in registration of stereo views which cause non-correspondence
of vertices will be determined and corrected for.
Results of analysis of inference of surfaces from images will be
implemented.  An investigation will be made of extensions of results on
inference of surfaces to stereo correspondence.  Further development of
use of shadows will depend on other support.

\ssheadsection{Geometric Constraints for Special Surfaces}

Constraints for Orthogonal Trihedral Vertices with horizontal and
vertical edges in nadir-viewing and oblique viewing will be analyzed
and implemented as appropriate.  Similar results for horizontal and vertical planes
will be obtained.  Any results for cylinders or inclined planes will
depend on other support.

\newpage
\ssheadsection{Correspondence Evaluation Function and Search Procedure}

Geometric constraints will be combined in an evaluation function which
quantifies a choice among possible solutions for stereo correspondence.
A global search procedure will be analyzed and implemented.  It 
will choose optimal  or quasi-optimal solutions
based on the evaluation function.
The evaluation function and search procedure will
accommodate the major effects of realistic non-correspondences caused by errors
of edge-finding, overhangs, motion between images, and photometric 
non-correspondence.
Evaluation and search will primarily be at the level of extended edges.

\ssheadsection{Evaluation of Technology}

Quantitative analyses of algorithms will be undertaken from
the standpoint of: 
\chart{1)}{geometric accuracy of determination of boundaries
in space, accuracy of angles, and consistency of symbolic assignments;}
\chart{2)}{fundamental conceptual limitations and acid tests, \ie problem examples;}
\chart{3)}{conceptual generalizations and extensions;}
\chart{4)}{intrinsic computational complexity;}
\chart{5)}{intrinsic memory requirements.}

Brief analyses will be made of efficient software and hardware
implementations.

\sheadsection{Development of Functional Specifications}

Stanford will consult with representatives of DMA and the Air Force in
preparing a preliminary functional specification for a follow-on
development effort, and will participate in a design review with
representatives from Air Force and DMA and selected interested parties.
Stanford will prepare a revised functional specification for a follow-on
development effort.  Stanford will consult on follow-on design with a
contractor designated by DMA and Air Force and will participate in a
design review of a follow-on development system.  Stanford will consult on
implementation, including supplying code directly.  Within the limits of the
resources and support of this effort, it is anticipated that close and
continuing communication will be maintained between Stanford and the
follow-on participants throughout the portion of this project in which
Stanford's involvement overlaps that of a follow-on industrial contractor.
Technology transfer should be maximized by having Stanford and any
follow-on contractor use similar computer systems, possibly common
stereo station hardware, and common software to allow direct transfer of
code.

\sheadsection{Reporting}

Interim reports will be prepared at six month intervals.
A final report will be prepared, documenting Stanford research accomplishments,
evaluating the follow-on system, summarizing the state of the art
at the end of the effort, presenting analyses of performance of
stereo algorithms, presenting projections for further progress
in automating stereo compilation, and recommending a course for
future development and supporting research. A minimum of four oral
presentations will be made during the course of the research.  These
presentations shall discuss design reviews and preliminary functional
specifications for the follow-on development effort.

\newpage
\setchaptitle{Compilation Goals}
\setchapter{\compgoals}
\chapheading{COMPILATION GOALS}

	Cartography has traditionally concentrated on mapping of terrain, producing 
elevation contour maps and digital terrain data bases at coarse resolution.
This manual stereo mapping is labor-intensive.
The bulk of stereophotogrammetry is accomplished by humans carrying out
stereo correspondence manually,  or assisting interactive  semi-automated  systems
(UNAMACE, AS-11B-X). Semi-automated systems provide automated correlation of
small areas of a scene, and require extensive operator intervention.

Major new DMA activities involve large volumes of stereo mapping,
including mapping microstructure of cultural sites, contour mapping for
digital data bases with higher resolution and finer elevation contour
intervals, and constructing reference images for autonomous terminal
homing.  These activities require roughly an order of magnitude or more
increase in accuracy in stereo mapping, with a resulting increase in
effort.  Current methods, largely manual, do not appear capable of the
production required to meet the combined effects of increased input volume,
higher accuracy, and increased output volume.  These activities
provide a strong motivation for development of capabilities for automation
of stereo cartography and augmentation of productivity of human operators.

Two DMA digital products whose production is addressed by this research
are the terrain database DTED (Digital Terrain Elevation Data)
and the feature database DFAD (Digital Feature Analysis
Data).  DTED is a matrix of elevation points generated at a specific spacing.
DFAD consists of 3-D digitized planimetric and cultural features along with
classification coded descriptions of their physical characteristics. Each item of these
databases is related precisely to a world geodetic coordinate system.
Current semi-automated production of DTED is available for analysis at
low resolution spacings of 3 arc seconds (about 90 meters).  Compilation of
DTED over a thousand-square-nautical-mile area at this resolution requires
about 4 man-weeks of labor and 1.5 machine-weeks (this is DLMS level I).
A DFAD at comparable
resolution is obtained entirely manually, and requires about 9 man-weeks
of labor.  Increasing the resolution for DTED to about 30 meters on the ground
(tripling it) brings the labor up to about 40 man-weeks and 20
machine-weeks, none of which is automated (DLMS level II).  The equivalent
DFAD requires about 6 man-years.  Special requirements
for select DTED tasks beyond this resolution lead to processing times on
the order of 4.5 man-years and 1 machine-year per thousand square nautical miles
(DLMS level III).

	There is wide agreement that radical improvements are necessary 
for substantial increases in productivity.  DMA requirements call for
reducing the compilation time of DTED and DFAD levels I and II by as much as
an order of magnitude. This can only be achieved through substantially
improved automation. Technology for automated stereo compilation arising from basic research
has advanced to the point that it deserves
serious consideration as a means of augmenting the productivity of stereo processing
[\bakeraij], [\arnbinf], [\arnoldthesis], [\grimson].
Past research has led to automated techniques which may be adequate for
compilation of relief maps for some areas of smooth terrain
([\mori], [\genneryijcai], [\kelly], [\panton], [\gennery]). These systems have shown
some success at smooth terrain mapping, but the key problem in automation is
recognized as determining stereo correspondence for complex
cultural structures.  Cultural areas present many difficulties for stereo
reconstruction not found in terrain mapping. Algorithms for terrain do
not generalize readily to cultural scenes.  Current automated compilation
equipment (Bendix AS-11B-X, Gestalt GPM II) has brought increased
automation to terrain mapping, yet still fails to map from $40\%$ to
$70\%$ of the scene.
These interactive partially-automated stereo compilation systems use
analog (more recently, digital) cross-correlation of small image areas to
track corresponding areas in image pairs. They require
distinctive texture within the area of correlation, breaking track on
sand, concrete, snow, or roofs (ambiguous texture or featureless areas).
They break track in trees, where depth is ill-defined.  The systems break track
where there are occlusions, and therefore no corresponding areas in the two images,
notably at buildings and thin objects (poles) where the correlation area
crosses surface discontinuities.

The proposed program  of exploratory research focusses on automatic
stereo compilation of cultural sites, yet with techniques applicable also
to terrain compilation.
An objective of our proposed research is to contribute to
techniques for stereo correspondence that automate many processes of
stereo mapping, \_{more accurately}, with \_{greater generality}, and for
\_{more complex material} than is now possible with available manual and
interactive techniques.

\headsection{Mapping Accuracy}

	Requirements of fidelity and accuracy in site geometry lead to the
following criteria for stereo reconstruction:
\chart{1)}{The reconstruction should be faithful --- it should make few systematic blunders.}
\chart{2)}{Reconstruction should be accurate --- it should estimate boundaries of surfaces
and internal detail of surfaces as accurately as fundamental limits allow
for that data.}
\chart{3)}{The reconstruction process should be general --- it should reconstruct
accurate maps of complicated cultural sites including curved surfaces.}
\chart{4)}{Developmental stereo compilation programs need not be exceptionally fast, but
they should have reasonably efficient implementations.
It is usual experience that large speedups are
possible by careful revision of code and algorithms.
Current and projected hardware capabilities promise great speedups in
well-understood algorithms.
Optimization beyond reasonable efficiency, and implementation in high
performance hardware, should be performed as a separate stage.}
\chart{5)}{The system  should  provide  for evolutionary  introduction  of  automated
modules.  Human interaction  should be smoothly  integrated where  needed.
The system should take as input
a sequence of images with associated guidance information useful for
registration, it should make small corrections to register
images in the sequence by determining modifications to 
camera transforms, and perform stereo compilation, with operator
assistance where required.}
\chart{6)}{The system should streamline data entry and modification of existing data
as a result of multiple coverage, to correct errors, to update changes,
and to combine observations for greater coverage and accuracy.}

Although these last two aspects are major benefits of digital stereo, they
involve many total system concerns which are beyond the scope and objectives
of this research. They relate to the introduction of automated mapping to
stereo compilation.  This process will be evolutionary, with successful
automated techniques introduced as available.  Our research effort is
directed at development of these automated techniques; at the moment
we have neither the facilities nor the expertise to address the
integration issues.  Thus parts 5 and 6 concern the follow-on effort, and
relate only peripherally to this proposed research.

\headsection{Generality}

	The stereo compilation system will include algorithms for stereo 
correspondence to accomplish the following functions:
\chart{1)}{disparity measurement from area correlation;}
\chart{2)}{accurate determination of edges and shapes of 
	cultural features and discontinuous terrain;}
\chart{3)}{surface interpretation based on monoscopic shape (shape from shape);
	integration of monoscopic features and interpretation
	with stereoscopic processing of image sequences to accomplish more
	effective surface segmentation;
	integrated photometric and geometric interpretations (shape from shading);
	shape interpretation and determination of unique, common, and familiar 
	features;}
\chart{4)}{application of  3-D geometric constraints to constrain interpretation;
	application of  3-D geometric constraints to smooth and control geometric
	reconstructions;}
\chart{5)}{fitting monoscopic and stereoscopic imagery to {\it a priori}
	assumptions in order to enhance and sharpen density and detail derived from
	images and to provide enhanced accuracy of measurement, based on {\it a priori}
	assumptions drawn from feature identifications and 
	classifications provided by a human photointerpreter;}
\chart{6)}{accurate fill-in of image detail visible only monoscopically in either
	photo of a stereo pair, in assigned areas of a 3-D model, where information 
	limits allow;}
\chart{7)}{application of {\it ``expert''} programs as available to aid human interaction
	for semi-automated and automated detection, classification, identification,
	and measurement of MC&G (mapping, cartography, and geodesy)
	features in a stereo model.}

\headsection{Complexity of Imagery}

Current mapping systems
break track where there is no local correlation (zero signal, or where the two
images do not correspond) or where the correlation is ambiguous (where the
signal is repetitive). Areas with occlusions, severe terrain breaks,
complex geometric shape (the norm in cultural scenes), sparse detail and
ambiguous texture, are beyond the capability of current mapping systems.
Imagery addressed in this research will exhibit all of these characteristics.
It will include urban areas, military and industrial complexes, and both flat
and mountainous terrain.
Test images for our system will be selected in cooperation with the Defense
Mapping Agency. They should be of realistic difficulty, reflecting typical
rather than worst case circumstances.

\newpage
\setchaptitle{Technical Approach}
\setchapter{\approach}
\chapheading{TECHNICAL APPROACH}

The approach we take in this research stresses integrating multiple
algorithms, utilizing all available information, and studying applicable
constraints and developing algorithms
to exploit each one fully. The proposed research will achieve
performance objectives by introducing new concepts, implementing improved
algorithms, and combining these with available technology. Specifically:

\chart{1)}{Information from sensing and knowledge-base sources will be
	integrated in a modularized, rule-based system.}
\chart{2)}{Flexibility in image format handling will be sought through the
	    consideration of varied imaging devices, such as frame and panoramic
	    cameras, and perhaps SAR.}
\chart{3)}{A focus on the analysis of cultural scenes is central to our work, and
	    this necessitates an emphasis on the following issues:}
\chartsml{.1)}{Improved image description is vital. Algorithms are being
	    developed and implemented for better edge finding, linking,
	    and segmentation, including curved element segmentation and spline
	    description.}
\chartsml{.2)}{Features available monocularly provide both discriminatory cues
	    for stereo correspondence and cues for the inference of surface
	    shape. Geometric constraints will be derived and
	    implemented for the monocular interpretation of general surfaces,
	    including an analysis of inference of shape from shadows and
	    extrapolation of surfaces into areas unseen in stereo.}
\chartsml{.3)}{Domain knowledge can facilitate both scene interpretation and stereo
	    compilation. Constraints for special configurations, especially
	    orthogonal trihedral vertices, vertical and horizontal planes
	    and cylinders, will be incorporated into the analysis.}
\chartsml{.4)}{Constraints derived by analysis of general viewing situations provide both
	    methods for determining stereo correspondence where domain context
	    is not known, and data for inferring the class of domain being analyzed.
	    Results from [\arnbinf] demonstrate the tight filtering such constraints
	    can allow. We will continue with our efforts in analyzing
	    these constraints and incorporating them into the processing.}
\chartsml{.5)}{The principal failing of current automated stereo systems ([\panton],
	    [\grimson], [\bakerthesis]) lies in the narrow scope they apply in assessing
	    consistency of their correspondence results. More global criteria
	    are required. For this, we will be developing extended surface
	    interpretation, including the use of
	    interline constraints between epipolar lines based on surface splines.}
\chart{4)}{The success of a digital compilation process depends upon the parameters
		and constraints used in the matching; on the evaluation function
		employed in measuring the goodness of possible terrain interpretations;
		and on the effectiveness and efficiency of the search procedure
		used to explore the space of potential interpretations. The role
		of constraints
		has been mentioned above.  An important part of our research task
		will be in integrating existing evaluation functions
		and search techniques and developing new ones to drive
		the correspondence process.}


	The techniques to be applied for each of these are described below.

This proposed effort will carefully track the stereo work of MIT, CDC, and others,
inside and outside the Image Understanding community, in order to integrate the
results of all possible contributors with this work.  As close a cooperation as
possible will be maintained with these research groups.

\headsection{Sensing and Knowledge-Base Sources of Data}

The data available to the stereo mapping system will include:

\vspssml
\chart{1)}{sequences of digitized monochrome aerial images
from a variety of possible sensors, at various resolutions;}
\chart{2)}{digital terrain elevation data (DTED), contour maps, or other
	elevation data, as available;}
\chart{3)}{digital feature analysis data (DFAD), as available;}
\chart{4)}{registration and navigation information which enables computing a
transformation from each
pixel to a ray in earth coordinates (longitude and latitude) and which
allows computing the sun angle for each pixel;}
\chart{5)}{possible other existing information about the area which is not
	in those forms, and which can be translated into geometric form.}


\sheadsection{Modular Structure}

One of the principal objectives in this research is to produce a
system with modular structure; one which can incorporate evolving
algorithms for stereo correspondence as they become available.  The system
should be capable of incorporating all useful knowledge and information.
It is emphasized that
this will be a research system.  Its design and implementation
involve systematization and incorporation of state of the art algorithms
into an advanced, automated stereo compilation system, modularized to
facilitate debugging, design, interfacing and incorporation of new
concepts and algorithms.

\sheadsection{Rule-Base}

Many of the constraints and search procedures to be described
can be implemented as rules for use by a rule-based reasoning
system.  This form of implementation allows easy modification and addition
of new rules without having to modify the underlying code.  It will
allow a user who does not even understand the language in which the
system is programmed to add constraints and evaluation procedures.

	Stanford has demonstrated an advanced rule-based system for
Image Understanding, ACRONYM [\brooksaij], a system which is well suited as the 
basis for integrating diverse stereo algorithms.

The rule system used by ACRONYM can be directly applied to stereo
compilation.  An ACRONYM rule has three major parts: a
set of pre-conditions, a body, and an advertisement.  The rule is
invoked by the rule system when its pre-conditions are true.  Then its
body is executed, and its advertisement declares new information which
can lead to the invocation of other rules.  Various control strategies
can be used to determine the order of rule invocation.
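
For concreteness, a rule of this three-part form might be sketched as follows.
The sketch is illustrative only: the class, the rule name, and the predicate
functions are hypothetical, and it does not reproduce the ACRONYM implementation.

```python
# Hypothetical sketch of a rule with pre-conditions, a body, and an
# advertisement, as described above; not the ACRONYM implementation.

class Rule:
    def __init__(self, name, preconditions, body, advertisement):
        self.name = name
        self.preconditions = preconditions  # predicates over the fact base
        self.body = body                    # action executed when the rule fires
        self.advertisement = advertisement  # new assertions announced after firing

    def try_fire(self, facts):
        if all(p(facts) for p in self.preconditions):
            self.body(facts)
            facts.update(self.advertisement(facts))
            return True
        return False

# Example: once linked edges exist in both images, propose epipolar matches.
epipolar_match_rule = Rule(
    name="propose-epipolar-matches",
    preconditions=[lambda f: "edges-left" in f, lambda f: "edges-right" in f],
    body=lambda f: f.setdefault("candidate-matches", []),
    advertisement=lambda f: {"matches-proposed": True},
)

facts = {"edges-left": [], "edges-right": []}
epipolar_match_rule.try_fire(facts)
```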


Work on ACRONYM at Stanford provides a powerful
model for a rule-based system capable of supporting flexible
strategies for stereo compilation, including constraints imposed
for specific sites, combined with a variety of algorithms which
provide varied information [\brooksinterp],[\brooksaij].
Work on a rule-based system under this
proposal will be limited to integration of stereo algorithms in
ACRONYM and developments of ACRONYM which are required by that integration.

\headsection{Flexibility in Imaging Formats}

\sheadsection{Sensor Formats}

We have made the design with non-central projection in mind, to
accommodate sensors such as panoramic cameras in addition to frame
cameras.  We have not yet had experience with panoramic cameras; we
expect there will be unanticipated problems to be overcome for this and other camera
geometries.

The problem of accommodating multiple sensor geometries is approached by mapping
both images onto an elevation map of the earth's surface in the form of
$z(x,y)$. The transforms and their inverses provide mapping between an image
and terrain data and the conjugate image.
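
A minimal sketch of the two directions of such a mapping is given below for the
simple case of a frame (central projection) camera.  The matrix conventions and
the function and parameter names are assumptions introduced for illustration;
they are not the system's actual camera model, and non-central sensors such as
panoramic cameras require a different image-to-ray computation.

```python
import numpy as np

# Project an earth-coordinate point (x, y, z(x, y)) into a frame-camera image.
# K is a 3x3 calibration matrix, R and t the camera orientation and position;
# these names and conventions are illustrative assumptions.
def ground_to_image(x, y, z, K, R, t):
    p_cam = R @ (np.array([x, y, z]) - t)   # point in camera coordinates
    u, v, w = K @ p_cam                     # homogeneous image coordinates
    return u / w, v / w                     # pixel coordinates

# The inverse direction maps a pixel to a ray; intersecting that ray with the
# current elevation surface z(x, y) gives the correspondence into terrain data
# and, through the conjugate camera, into the other image.
def pixel_to_ray(u, v, K, R, t):
    d = R.T @ np.linalg.solve(K, np.array([u, v, 1.0]))
    return t, d / np.linalg.norm(d)         # ray origin and unit direction
```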


Analysis will be made of both digital and film imagery. We will work only
with imagery from frame cameras.
The Stanford Stereo Station [\liebesschwartz] will be used for computer-assisted
ground truth and image registration calculations, when this data is
not provided with the imagery. An Optronics film
digitizer will give us softcopy of film imagery.

\headsection{Issues for the Analysis of Cultural Scenes}

\sheadsection{Improved Image Description}

	Current research in stereo compilation at Stanford, CDC, and MIT
depends heavily on determining edges effectively.
Edge data are coupled with area correlation or an equivalent in producing the
total correlation. Effective edge segmentation is thus an
important part of a stereo program.  Improved edge segmentation increases
the accuracy of surface boundary estimation and thus determines the
accuracy of geometric reconstruction.  For example, it is important that
walls of buildings meet at right angles and are vertical.  New techniques
estimate edges with position and orientation accuracy about 4 times better
than Sobel or similar operators and 15 times better than area correlation.
Analytical results at Stanford show techniques for accurate position and
angle estimation, linking edges and determining junctions.  Implementation
is proceeding in two directions, first toward an intermediate level system
which will provide short term results with distinct improvements over
current methods and with high accuracy, and second, toward implementation
of the full theory. [\binfordaij] contains an analysis of the problems
confronting an edge-finding and edge-linking system.
Those problems and their solutions appear in the following paragraphs.

\_{Smooth gradients are very common.}  Most edge operators fail at finding
boundaries in the presence of these large gradients from smooth shading,
\eg on the fuselage of an aircraft.  Gradient operators produce large
continuous areas of spurious edges over the fuselage. Clearly, both
thresholding and operators which depend on the gradient are inadequate in
realistic cases.  That includes the Hueckel operator [\hueckel], [\hueckellines],
Sobel operator, Roberts cross [\roberts], O'Gorman's Walsh transform
operator [\ogorman], and Nevatia and Babu's edge finder [\nevatiababu].
This is only one of the major failings of gradient operators and
thresholding.  Operators sensitive to the gradient are unacceptable
{\it a priori} in realistic cases.  Binford-Horn provided a solution to this by
introducing lateral-inhibition, similar to a second derivative or
Laplacian, \ie  determining the contrast relative to neighbors by
subtracting the local weighted average.  [\marrpoggiob] provided interesting
arguments about the form of lateral inhibition and appealed to biological
observations.

\_{Localization to sub-pixel accuracy is required to make full use of
data.}  Thresholding and gradient operators do not localize edges
accurately, thus they degrade the quality of information they extract.
[\binfordaij] makes a detailed analysis of this point.  The essential
problem is that a step edge occurs at the maximum of the gradient.  Because
pixel data are weighted integrals over
incident intensity, the maximum of the gradient cannot be determined accurately
(the gradient is flat at its maximum).  However, the second derivative has a
zero at the maximum.  The zero crossing can be obtained by estimation to
high accuracy.  Binford-Horn and Marr make use of zero crossings to
estimate edges.   Binford-Horn used directional edge operators;
[\binfordaij] shows a clean and general way to estimate angle and
transverse position for directional edge operators.
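
As a simple illustration of zero-crossing localization (given here only as an
example, not as the specific estimator of [\binfordaij]): if the laterally
inhibited (second derivative) response takes values $f↓i$ and $f↓{i+1}$ of
opposite sign at adjacent pixels $x↓i$ and $x↓{i+1}$, linear interpolation
places the zero crossing, and hence the edge, at
$$x↓0 = x↓i + {f↓i \over f↓i - f↓{i+1}}\,(x↓{i+1} - x↓i),$$
a sub-pixel estimate whose accuracy improves with the signal-to-noise ratio of
the operator response.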

\_{The termination of edges at vertices must be estimated.}  [\binfordmiami]
and Binford-Horn [\binfordhorn] showed ways to extrapolate to junctions of
edges.  [\binfordaij] outlines improved methods.  Additional research
should lead to improved results at vertices.

\_{In typical images, boundary finding programs miss many boundaries that
humans find easily.}  These are not problems of higher level knowledge;
informal experiments show that humans find these edges when the rest of
the image is masked. Binford-Horn showed excellent sensitivity by using
directional operators and linking of extended edges.  Current research is
extending that capability.

\_{The linking of local boundary elements is an important and poorly
understood problem.}  This is one phase of local organization of image
elements.  Regions have been implemented as path-connected components
based on four-neighbor or eight-neighbor relations as in the TOPOLOGIST
[\binfordmiami] or [\kelley].  This is an especially simple algorithm.
Industrial vision systems are built using this contour following.
Binford-Horn used directional linking which could bridge gaps of low
signal, without missing termination.  Current methods under development at
Stanford offer considerable improvement, and are directed at image
segmentation in curved elements and splines.

\_{Texture is a major problem.}  [\herskovits] and Binford-Horn [\binfordhorn]
showed methods for dealing with small amounts of texture in finding
boundaries of regions with uniform texture.  No effective ways have yet
been identified for dealing with textured regions and finding their
boundaries.  Improvements in edge operators make the time ripe for
research in this area. Although there are promising directions to explore,
capabilities in texture cannot be counted on for the proposed effort.

This research effort will integrate algorithms for edge segmentation from
among those currently available.  This will principally involve exploiting
recent results from Stanford [\marimontiu].  If further development of
algorithms along these lines is continued by Stanford, it will be
under other support.

\newpage
\sheadsection{Image Structure Representations}

The segmented description of an image will be composed of the following:

\chart{1)}{\_{Curvels:}  Image curve elements will be characterized by a geometric part and
a photometric part.  The geometric part contains a unit vector, ($u,v,\theta$),
with continuity links to connected curvels.  The photometric part includes
the intensity cross section, contrast, and probability of being a  random
fluctuation.}
\chart{2)}{\_{Extended curves:}  Extended curves will be characterized by linked curvels,
and by segmented splines.  Linked curvels are chains of curvels with
continuity and collinearity relations, described locally by curvature.
Segmented splines have knots where the curve has discontinuity in tangent
vector or discontinuity in curvature. They are continuous between.
With each extended curve is an intensity cross section.}
\chart{3)}{\_{Image structures:}  Image structures are extended curves linked by
structural relations which include intersection, collinearity across gaps,
parallelism, symmetry, co-termination, and textural groupings. 
These structures are related in a curve, junction, region graph.
Junctions relate intersecting curves; they contain estimates
for intersections and terminations. Junctions contain links to surface
interpretations obtained by inference.  Broken extended curves have links
between connected curves, across gaps, as in the dotted center line of a road.
Regions are also described by geometric and photometric properties,
by their boundaries, and neighbors with inclusion relations,
texture patterns, reflectance estimates, and shading.  Regions contain 
links to surface interpretations.}
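
The representation just listed can be sketched, purely for concreteness, as a
set of linked records.  The field names follow the description above, but the
concrete layout is an assumption made for illustration, not the system's data
structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Illustrative sketch of the image-structure records described above.

@dataclass
class Curvel:
    u: float                    # image position
    v: float
    theta: float                # local edge orientation
    cross_section: List[float]  # intensity cross section samples
    contrast: float
    p_random: float             # probability of being a random fluctuation
    links: List["Curvel"] = field(default_factory=list)  # continuity links

@dataclass
class ExtendedCurve:
    curvels: List[Curvel]       # linked chain with continuity/collinearity
    knots: List[int]            # spline knots at tangent or curvature breaks
    cross_section: List[float]  # representative intensity cross section

@dataclass
class Junction:
    curves: List[ExtendedCurve]            # intersecting curves
    position: Tuple[float, float]          # estimated intersection or termination
    surface_interpretation: Optional[str] = None  # link to inferred surface
```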

Image pixel data will be related to an earth coordinate elevation system
$z(x,y)$, with transforms and their inverses mapping between elevations and
images.
In this way, any available initial topographic information can
be used as a starting point for search, limiting computation.  This
surface representation aids in resolution of ambiguity.  Unambiguous
elements on the surface serve to anchor the surface.  The entire surface
must be consistent.  Ambiguous areas with repetitive textures or
features are fixed by their relation to other
areas.  In the same way, since there is only a single surface, the problem
of unique correspondences between images can be dealt with.

The problem of sparse data requires interpolation and extrapolation.  It
will be dealt with by defining the elevation map by a piecewise smooth
surface representation, a spline with discontinuities in depth and tangent
plane.  Some seams of the spline correspond to ridges, troughs, and steps
in the elevation map of the surface; at other seams, the surface will be
smooth.  At many seams where the surface is smooth, surface patches may be
merged and seams removed.  Seams of the spline will be chosen along image
discontinuities.  Positions of patches of the surface will be included.
The spline will be implicit, \ie it will be defined by the seams and
interpolation by quadratic variation [Brady and Horn, MIT memorandum].
Note that interpolation will be done without bridging
discontinuities, which will make a significant improvement over
[\grimson].  Associated with elements of the surface model are limits and
errors associated with accuracy of terrain elevation data, camera model,
and other measurements.
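
The point about not bridging discontinuities can be illustrated with a simple
relaxation scheme.  Membrane (Laplace) smoothing is used below for brevity in
place of the quadratic-variation interpolation cited above; the array names and
the seam convention are assumptions, and the sketch is not the proposed
implementation.

```python
import numpy as np

# Relaxation interpolation of a sparse elevation grid that never averages
# across marked seams (depth or tangent-plane discontinuities).
def interpolate(z, known, seam, iterations=500):
    z = z.copy()
    for _ in range(iterations):
        z_new = z.copy()
        for i in range(1, z.shape[0] - 1):
            for j in range(1, z.shape[1] - 1):
                if known[i, j] or seam[i, j]:
                    continue                  # keep data points and seam cells fixed
                nbrs = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
                vals = [z[a, b] for a, b in nbrs if not seam[a, b]]
                if vals:
                    z_new[i, j] = sum(vals) / len(vals)
        z = z_new
    return z
```
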
\sheadsection{Monocular Interpretation of General Surfaces}

\ssheadsection{Occlusion and Perspective Cues}

The depth structure of an image
can be identified qualitatively, and in many cases quantitatively, from
a single view. Human interpreters utilize this capability
in identifying features in single views and matching these features across
views. There are rich results to be gained from incorporating monocular
cues to surface shape.

	New theoretical results outline a theory of interpretation of 
edges and intensities in images, a theory which promises to include curved surfaces
with surface markings and shadows, paper (non-solids), and wires
[\binfordaij]. These results appear to have definite application
for stereo and photointerpretation. 
Results in inferring surfaces from images will be extended to stereo
correspondence.
The approach begins by characterizing
surfaces by their edges and limbs, \ie apparent edges. 
It uses very general assumptions of general source position and
general observer position to identify evidence concerning which edges and
surfaces intersect in space, and to identify evidence on whether  surfaces
are smooth at apparent edges.
In the absence of other information, if curves appear to intersect
in an image, and if their inverse image is sufficiently constrained,
assume that they are images of curves which intersect in space.
The inverse image of a curve in an image is the set of rays in space (a surface)
which project onto the curve in the image.
Where curves are smooth at junctions, assume that the surface is smooth.
Where curves have breaks, assume that the surface has a crease, \ie  discontinuous
tangent plane. Where shadows and surface 
markings cross true edges, the surface is unaffected at the true edge, and the image
of the edge is smooth.
Thus the system infers that the surface is smooth along the shadow or 
surface mark.  These results are much more general than previous
theoretical studies of interpretation of line drawings.
These results will be extended greatly.

\ssheadsection{Inference From Shadows}

	[\lowe] exploits some of these assumptions in interpreting
an aerial photograph of an aircraft. 
First, occlusion cues are derived which provide 
a layered relative depth model:  the fuselage is above the wing which
is above the engine pod which is above the shadow which is on the ground;
the shadow is over a white line which must be on a smooth surface.
The system does not know that the surfaces are wings, engine pods, etc.
The techniques are quite general: they do not depend at all on knowing
the objects in question, and they use cues which are universally available.
By matching junctions of shadow images
with junctions of the edges which cast the shadows, projected
along rays from the sun (the coincidence assumption), the system
determines the heights of the shadow-casting edges by triangulation with the
known sun position.
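
For example, in the simplest case of flat, horizontal ground, an edge point
whose shadow falls a horizontal distance $s$ away, with the sun at elevation
angle $\theta$ above the horizon, has height
$$h = s\,\tan\theta,$$
so matching a shadow junction to the casting edge yields that edge's height
directly.  This is an illustrative special case only; the general formulation
of [\liebes] handles oblique surfaces and extended sources.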

	Where shadow information is available, it enables accurate measurement
of three-space positions of many edges in the scene, from which other
dimensions can be inferred.  Shadows provide information equivalent to
stereo.  Use of shadow information is an important capability for 
geometric reconstruction for photointerpretation.

Shadow information can be used to check for consistency in stereo results, to
infer geometrical detail for portions of a scene that are visible in only
a single image, and to infer the shape of textureless surfaces that are
encountered in cultural objects.  Usually in mapping sequences, the source
position is known and shadows can be used in a straightforward way.  The
position of the sun relative to the camera can be determined from simple
navigation data.  Where the sun's position is unknown, it appears
promising that the sun position can be obtained automatically.

In cases in which the source position is known or inferred, shadows can be
identified by correspondence with object vertices along rays from the sun.
Shadowed regions are darker than unshadowed regions.  If the sun is the
source of illumination, then the illumination ratio between shadow and
sunlight will be the ratio of diffuse to direct sunlight, which is roughly
equal across shadow boundaries within an image (but not constant from
image to image because of changes in diffuse illumination caused by
weather variations).

There are more global conditions.  Where shadow boundaries cross edges,
the shadow image boundary will have position and/or tangent
discontinuities corresponding to position and/or tangent discontinuities
across the surface boundary.  Where shadows cross surface markings without
edges, there are no breaks in the shadow; the junction is an X.
Conversely, if there is a position or tangent discontinuity in a shadow
image, there is a position or tangent discontinuity in the surface on
which it lies.

If reflectivities across a surface boundary are $r↓1$ and $r↓2$ and the
illumination ratio across a shadow boundary is $r$, then the ratio of image
intensities across the image boundary on one side of the shadow is
${r↓1\over r↓2}$, and on the shadowed side the ratio is
the same, ${r↓1 r\over r↓2 r}$.  In many cases, because the dynamic
range of film or digitizer is limited, the computer cannot {\it ``see''} into
shadows, as in the example above. A part of the available information is
lost but typically there remains enough for successful interpretation, as
in [\lowe].
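
As a worked illustration (the numbers are chosen only for concreteness): if the
reflectivities are $r↓1 = 0.4$ and $r↓2 = 0.2$, and the shadow illumination
ratio is $r = 0.5$, then
$${r↓1\over r↓2} = {0.4\over 0.2} = 2, \qquad
  {r↓1 r\over r↓2 r} = {0.4\times 0.5\over 0.2\times 0.5} = 2,$$
so the reflectance ratio across the surface boundary is preserved inside and
outside the shadow, and can be used to check that a hypothesized shadow
boundary is photometrically consistent.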

[\liebes] has developed a general formulation of the principles of the
perspective geometry of shadow formation for local and remote point
sources and extended sources, and has applied this formulation to a
variety of basic geometrical configurations.  It is
proposed to follow this investigation of shadows to further results which
appear to be directly ahead.


\ssheadsection{Use of Monocular Cues}

	These monocular interpretations are not a substitute for
stereo, but they can aid greatly in stereo correspondence.
One problem with stereo systems has been that they do not use shape as humans
do, and they do not use context.
There are two classes of monocular cues described above: first,
depth ordering of surfaces from occlusion cues, combined with object
segmentation; second, quantitative depth models obtained from analysis of
shadows.
Correspondence is enormously simplified if there is a qualitative
ordering of surfaces, and simplified more still if there is a quantitative
model.  That is, matching top surfaces with top surfaces, and ground surfaces
with ground surfaces leaves a small search space.

	Equally, such inference methods can be used to identify when special
cases are applicable, \ie when surfaces are planar. Specialized knowledge,
for example, that of OTV's, may then be exploited for the analysis.

\sheadsection{Geometric Constraints for Special Surfaces}

Separate, additional funding will be sought for research in analysis and
implementation of geometric constraints for special surfaces.  If such
support is found, research in this proposal will integrate the results
into an overall system.  Otherwise, a
portion of the proposed research will be devoted to a small effort in analysis and
implementation of a subset of familiarity constraints for special surfaces.  These
constraints are important for stereo compilation of cultural sites.
[\liebes] has achieved a number of preliminary results on the use of
monocular cues for special surfaces in stereo matching for cube corners,
orthogonal trihedral vertices (OTV's), in nadir-oriented stereo photos.
Given a cube corner in one image, the rules uniquely specify 
the orientation of the vertex, its appearance in the matching view, and
its significance in the structure under observation.
It is proposed to push this area further, with the plan of analyzing
cases for horizontal and vertical planes and cylinders and implementing
algorithms for them.


The objective of determining a depth map $z(x,y)$ can be aided by symbolic
description of portions of the scene.  In some portions of an image,
little or no depth information can be obtained, \eg  on water, snow,
concrete, gravel roofs, uniform metal surfaces, etc.  The best information
about surface height in such cases can be obtained by fitting a surface
satisfying boundary conditions with symbolic constraints, \eg planarity.
A lake is an extreme case: boundary conditions apply to the height of the
water at the lake boundaries, with the constraint that the water surface is
horizontal.  On a horizontal circular cylinder, the surface has boundary
conditions that the image position of the boundaries of the visible
surface is known and the surface tangent is known to be along the line of
sight.  Thus, symbolic constraints provide ways of translating familiarity
cues into improved measurements.
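
In the lake example, for instance, the horizontality constraint reduces the fit
to a single parameter: if $z↓1,\ldots,z↓n$ are elevation estimates along the
shoreline, the least-squares horizontal surface is simply
$$z = {1\over n}\sum↓{i=1}↑{n} z↓i ,$$
and the residuals $z↓i - z$ provide a check on the boundary measurements.  This
is an illustrative special case only; inclined planes and cylinders require
correspondingly richer parameterizations.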

The direction of the gravity vector is perhaps the single most important
determinant in the alignment of architectural structures.  Walls of
buildings and their edges tend to be vertical.  Most of the visible area
in an aerial photograph of a building is the roof.  Roofs are usually
composed of planar sections, many horizontal.  Architectural structures
contain many right angles and orthogonal trihedral vertices, at
intersections of walls and roofs of concrete buildings, at doorways,
windows, and the like.  These elements are usually aligned with gravity.
Pipelines and large cylindrical structures such as storage tanks tend to
have their axes either horizontally or vertically aligned.

Vertical and horizontal surfaces and edges occur with such frequency in
cultural objects that it is valuable to work out special case constraints
for these construction elements.  It is intended to address the important
special cases of plane and cylindrical structural elements, especially
right parallelepipeds and right circular cylinders aligned with gravity,
and vertical and horizontal surfaces.  To ignore these capabilities for use of
special structures is to throw away valuable information.  Liebes is working
to quantify these special structure constraints.

Consider the case of orthogonal trihedral vertices (OTVs), which appear as
internal and external corners of right parallelepipeds.  The approach is
based upon the use of projective invariants.  Images of OTVs show
projective distortion depending upon their orientation relative to the
camera.  The study of projective transformations and stereoscopic imagery
has yielded valuable formulations involving projective invariants,
coordinate representations, and stereo edge element organization and
analysis.  These formulations have been applied to projective invariants
of OTVs, using the locations of the vanishing points for their edges, or
equivalently their surface normals.  In an application to the important
special case of nadir-oriented stereo cameras, a simple set of projection
and visibility rules has been found that uniquely labels the corners.
Given an OTV in one image, the rules specify the quantitative appearance
of the corresponding OTV in the conjugate image, as a function of relative
displacement along the associated epipolar line.  The nadir viewing case
extends directly to oblique viewing configurations.

The simplicity of the rule formulation in nadir viewing arises from the
facts that the vertical edge vanishing point coincides with the nadir
point, and both of the remaining OTV edge vanishing points are oriented at
right angles to one another at infinite distance parallel to the film
plane.  In the more general oblique case, all three vanishing points are
at finite distance from one another in the film plane.  The rules in the
latter circumstance more explicitly utilize the projective relationship of
the edges of the sixteen different kinds of corners to the vanishing
points.  Liebes has demonstrated that elements of the approach extend to
the case of vertical cylinders with arbitrary polygonal cross section.
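For reference, the vanishing point used in these formulations can be computed
directly from the camera model.  The sketch below assumes a standard pinhole
model with calibration matrix K and rotation R; it is an assumption of this
illustration, not a statement of the formulation developed by Liebes.
\quote{
# Sketch (assumed pinhole camera model): the vanishing point of a family of
# parallel edges with 3-D direction d is the image of the point at infinity
# along d, i.e. the projection of K R d.  For a nadir-pointing camera the
# vertical direction projects to the principal point, the nadir point noted
# above; directions parallel to the film plane have no finite vanishing point.
import numpy as np

def vanishing_point(K, R, d):
    v = K @ R @ np.asarray(d, dtype=float)   # homogeneous image coordinates
    if abs(v[2]) < 1e-12:
        return None                          # vanishing point at infinity
    return v[:2] / v[2]
}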


\sheadsection{Geometric Constraints for General Surfaces}

[\juleszbino] has shown that stereo vision is possible in random dot
stereograms without any monocular cues.  The problem of stereo
correspondence is sometimes identified with correspondence without
monocular cues, which is really a subproblem that accounts for only a
small part of stereo correspondence capability.  Note that fusion of
random dot stereograms is slow; even adept viewers require seconds to trace
boundaries accurately.  Also, examples used in random dot stereo
are simple compared to real world scenes, \eg a single square against a
background; a smooth saddle-shaped object or a layered wedding cake are
among the most complex.  Random dot stereo is a real and significant
capability, but it ignores all monocular cues which humans use extensively
and which are necessary for adequate stereo performance.  This section
deals with stereo constraints which primarily depend on features observable in
single images, but which do not depend on familiarity.

	Recent Stanford results show geometric constraints on 
stereo correspondence of edges and surfaces, and constraints on occlusion
[\arnbinf]. New theoretical results outline a theory of interpretation of 
line drawings which promises to include curved surfaces
with surface markings and shadows, paper (non-solids), and wires
[\binfordaij]. These results suggest general methods for inferring whether
surfaces are planar, or if not planar, for inferring their curvature.


\ssheadsection{Surfaces and Complexity}

	Julesz has focussed
on the problem of local ambiguity of features in stereo correspondence.
Refer to the direction normal to epipolar planes as the elevation and the
direction within an epipolar plane as the azimuth.
The classic problem of ambiguity in stereo matching is depicted by
an epipolar plane with N identical points seen from two views.
There are $N↑2$ potential matches corresponding to all possible
intersections of pairs of rays.  A surface interpretation reduces this 
ambiguity in the following way.
A single  underlying surface 
generates a curve through the true match points.
In natural situations, humans deal with surfaces and not isolated, identical 
points.  If there were no obscurations and
no points without correspondence, possible solutions would be limited
to $2N-1$ such curves.  Since obscurations and non-correspondences are
few and limits are known for heights of structures, the problem is
believed to have difficulty which is a small multiple of the case without
obscurations, \ie linear in $N$.
Transitions on this ray-intersection diagram correspond to surfaces.  Transitions along
rays from one camera or another correspond to surfaces along the
line of sight, \ie pairs of features coincident by accident in one view.
They can be ignored unless
there is evidence to the contrary (principle of general observer position).

\ssheadsection{Constraints within Epipolar Planes}

Research will extend the determination of constraints within
epipolar lines and between epipolar lines.
These constraints apply within conjugate epipolar lines, which are
intersections with the two image planes of a plane through the two camera centers
of a stereo pair. 
At Stanford, Arnold and Binford show geometric constraints on occlusion 
and show that there are quantitative
constraints on corresponding edges and surfaces ([\arnold], [\arnbinf]).
A necessary condition for occlusion was derived.
If occlusion is assumed, the portion of the surface which is obscured in one
view is taken to lie on the more distant surface.
Extrapolation of obscured surfaces will be analyzed and implemented, using
the occlusion constraint.

\_{Constraint A:} The constraint on correspondence of single edges is a
consequence of the observation that if two views of an edge differ greatly
in angle, the edge must lie nearly along the line of sight in one view.
Corresponding edges appear to have similar angles in two views; randomly
mismatched edges have quite different angles.  Mismatches are peaked along
the lines of sight.  For edges randomly distributed in direction on the
unit sphere, most edges will appear to have nearly the same angles in two
views.  The distribution of differences of angles in two views has zero
mean and a narrow width which depends on the stereo baseline. 
This constraint is restrictive for mapping sequences of aerial photos; it is
much more restrictive for human stereo.  The distribution of differences
of angles in two views has a half width at half maximum for true matches
of 3 degrees for human stereo at 1 meter.  These results have considerable
utility for stereo correspondence for aerial photos.  The results also
have consequences for biological stereo vision which are consistent with
experiment. If randomly distributed edges are mismatched, the false matches
have a distribution with two peaks along the lines of sight to the two
cameras.
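The shape of this distribution is easy to examine by simulation.  The
following sketch draws edge directions uniformly on the sphere, projects them
into two views, and returns the differences of apparent image angles, which
can then be histogrammed; the simplified orthographic projection and the
nominal vergence angle are assumptions of the illustration.
\quote{
# Monte Carlo sketch of constraint A (simplified orthographic geometry is an
# assumption of the illustration): edge directions are drawn uniformly on the
# unit sphere and projected into two views whose lines of sight differ by a
# vergence rotation about the elevation axis; the returned angle differences
# can be histogrammed to show a narrow, zero-mean distribution.
import numpy as np

def angle_difference_samples(vergence_deg=10.0, n=100000, seed=0):
    rng = np.random.default_rng(seed)
    d = rng.normal(size=(n, 3))
    d = d / np.linalg.norm(d, axis=1, keepdims=True)     # uniform directions on the sphere
    t = np.radians(vergence_deg)
    R = np.array([[np.cos(t), 0.0, np.sin(t)],           # second view rotated about the
                  [0.0, 1.0, 0.0],                        # elevation (y) axis
                  [-np.sin(t), 0.0, np.cos(t)]])
    a1 = np.arctan2(d[:, 1], d[:, 0])                     # apparent edge angle, view 1
    d2 = d @ R.T
    a2 = np.arctan2(d2[:, 1], d2[:, 0])                   # apparent edge angle, view 2
    diff = (a2 - a1 + np.pi / 2) % np.pi - np.pi / 2      # undirected edges: wrap to (-90, 90]
    return np.degrees(diff)
}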

\_{Constraint B:} If surfaces correspond, they tend to have the same projected
length along an epipolar line in two views.  Surfaces with projected
length which are very different in two views lie nearly along one of the
lines of sight.  For surfaces randomly oriented in space, projected
lengths are nearly equal for most surfaces.  Randomly mismatched intervals
correspond to surfaces which are peaked along the lines of sight from the
two cameras.  Again, the constraint is useful for mapping
sequences with wide stereo baseline; the constraint is very tight for
human stereo.  There is an additional constraint on torsion of pairs of
edges at boundaries of a surface.  Smooth surfaces have low torsion.
Constraints on surfaces are equivalent to constraints on pairs of edges.
Thus, constraint A corresponds to single edges and constraint B
corresponds to pairs of edges.  There are also constraints on triplets of
edges, corresponding to pairs of surfaces.  For example, mismatches tend
to have angles of zero or $\pi$ between adjacent surfaces.  For natural
scenes, angles of $\pi$ between adjacent surfaces are unusual; angles near
zero are frequent.  For manmade constructions, adjacent surfaces tend to
be perpendicular.  These distributions will be quantified.
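A simple way to use constraint B in an evaluation function is as a score on
the projected-length agreement of a candidate interval pair.  The sketch below
is illustrative, and its tolerance is a placeholder rather than a value derived
from the distributions to be quantified.
\quote{
# Sketch of constraint B as a matching score (the tolerance is a placeholder):
# corresponding surface intervals should have nearly equal projected lengths
# along conjugate epipolar lines; a large relative difference implies a surface
# nearly along one line of sight and is penalized.
def length_agreement(len_left, len_right, width=0.2):
    if len_left <= 0.0 or len_right <= 0.0:
        return 0.0
    r = abs(len_left - len_right) / max(len_left, len_right)   # 0 means equal lengths
    return max(0.0, 1.0 - r / width)                            # linear falloff to zero
}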

\ssheadsection{Scene Modeling}

	Consideration of typical cultural sites and terrain indicates that
$95\%$ of the surface visible in aerial photos is approximately horizontal.  
These scene modeling results indicate that edge and surface constraints are 
much more useful with typical scenes than with randomly oriented edges
and surfaces.
A worst case is residential housing areas with $30\%$ of the surface
covered with slanted roofs.  Even in this case,
hip roofs are usually bounded by parallel horizontal edges.
Slanted roofs are close enough to horizontal for edge and surface constraints
to be useful.
Where horizontal surfaces and edges exist, powerful special case constraints 
can be applied, as treated in the previous section
under the category of geometric constraints for special configurations.
	In aerial photographs, vertical edges tend to be oriented near the
lines of sight; thus they require special treatment.
Vertical edges in aerial photographs typically appear short; measurement
of their direction is feasible with sophisticated edge operators.
For modeling expected edge and surface distributions, the
model of a uniform distribution on the unit
sphere is modified by superimposing a peak along the 
vertical and a band along the horizontal plane for cultural objects and the 
ground surface.  These results of image modeling will be integrated in a
quantification of probability of correspondences.
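A sketch of such an edge-direction prior is given below; the mixture weights
and angular spread are placeholders to be replaced by the quantified
distributions, not measured values.
\quote{
# Sketch of the edge-direction prior (mixture weights and spread are
# placeholders): a uniform distribution on the sphere with an added peak
# along the vertical and a band near the horizontal plane.
import numpy as np

def sample_edge_directions(n, w_uniform=0.2, w_vertical=0.3, w_horizontal=0.5,
                           spread_deg=5.0, seed=0):
    rng = np.random.default_rng(seed)
    which = rng.choice(3, size=n, p=[w_uniform, w_vertical, w_horizontal])
    d = rng.normal(size=(n, 3))
    d = d / np.linalg.norm(d, axis=1, keepdims=True)        # uniform component
    s = np.radians(spread_deg)
    vert = which == 1
    d[vert] = np.column_stack([rng.normal(0.0, s, vert.sum()),
                               rng.normal(0.0, s, vert.sum()),
                               np.ones(vert.sum())])          # near-vertical edges
    horiz = which == 2
    d[horiz, 2] = rng.normal(0.0, s, horiz.sum())             # flatten into the horizontal band
    return d / np.linalg.norm(d, axis=1, keepdims=True)
}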


\ssheadsection{Constraints for Extended Edges}

	Much tighter constraints come from working with extended edges.
First, edges can be extended uniquely  between  epipolar lines.
By contrast, within epipolar lines, an
edge in one view may correspond to many edges in the other view.
Continuity constraints on edges
were implemented in [\arnold]. Similarly, tangents of  surfaces
vary slowly across  epipolar lines. 
Second, since corresponding points must lie in the same epipolar plane,
\ie must have the same elevation, endpoints of edges must have the same
elevation, except where  there are  occlusions.
Obscurations can be inferred  by methods 
described below for inferring surfaces from images.
There will be slight variation of elevation for endpoints
caused by small errors in registration.
Small registration errors will  be  determined  and
corrected by modifying parameters of registration and calibration.
The probability of false correspondence of terminations and 
the corresponding constraint function will be quantified.
These results will be extended.
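As a minimal illustration, assuming rectified images in which rows index
elevation, the endpoint condition can be expressed as a simple consistency
test with a registration tolerance.
\quote{
# Minimal consistency test (assumes rectified images in which rows index
# elevation): endpoints of corresponding extended edges must have the same
# elevation up to a small registration tolerance, except where one end is
# obscured.
def endpoints_consistent(edge_left, edge_right, tol_rows=1.5):
    # each edge is ((row_top, col_top), (row_bottom, col_bottom))
    (t1, _), (b1, _) = edge_left
    (t2, _), (b2, _) = edge_right
    return abs(t1 - t2) <= tol_rows and abs(b1 - b2) <= tol_rows
}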

\sheadsection{Extended Surface Interpretations}

Proposed research will seek search procedures for interline constraints
among multiple epipolar lines, using an improved evaluation function
including interline constraints.  As described above, it will use extended
edges and employ algorithms for inference of surfaces.  CDC used a dynamic
programming algorithm for finding the best solution for their evaluation
function within a single epipolar line.  A generalization to multiple
epipolar lines has been made at Stanford, but the dynamic programming
procedure requires prohibitive computation cost when generalized to
include interline constraints.  We have found no effective equivalent
dynamic programming search method for multiple epipolar lines, however we
can formulate it as a graph search.  We have good arguments that the
computational complexity of the interline constraint search problem is
roughly the same as the complexity of the search within a single epipolar
plane.  Approaches to a solution are under investigation.  Dynamic
programming approaches used at CDC and Stanford have the failing that they
do not accommodate overhang at wires or around buildings viewed from
low-flying aircraft, overhangs which cause order reversals between images.
Approaches will be sought to generalize search methods to include order
reversals.

CDC has found it useful to go beyond single epipolar lines in constructing
adequate geometric models from stereo correspondence.  Their approach has
been to use special case geometries in a syntax-directed matching,
particularly L-shaped flat roofs.  CDC's current approach appears to allow
considerable meshing of their methods with those of this proposal.

\headsection{The Evaluation Function and Search Procedure}

The success of a digital compilation process depends upon the relevance
of the parameters
and constraints used in the matching; on the quality of the evaluation function
employed in measuring the goodness of possible terrain interpretations;
and on the effectiveness and efficiency of the search procedure
used to explore the space of potential interpretations. The role
of constraints
has been discussed above.  An important part of our research task
will be in selecting the evaluation function and search procedure for
the compilation process.

\sheadsection{Considerations for an Evaluation Function}

The distinction between constraints and evaluation functions is that we
may have constraints and not know how to combine them or how to relate
them to likelihoods.  Evaluation functions are determined by
constraints, measurement errors, distributions of measurements of
properties of features, and their interpretation.  Search procedures are
distinct from evaluation functions in that we may know how to recognize a
solution if we find one, but not know how to find one.

An evaluation function must reflect faithfully the relations between 
two images which result from the physical conditions
of different perspective views.  If not, it will include major biases
toward faulty interpretations.  Consequences of this observation are spelled
out below.  It is worth noting that an evaluation function which has purely
a penalty for differences between two images will tend to produce interpretations
which minimize overlap between the two images.  If not constrained by prior
geometric knowledge, a pure penalty, \eg least squares, would produce interpretations
without any overlap at all.  This is quite unsatisfactory.

It is useful to consider two parts of an evaluation function, a photometric 
part and  a geometric part.

\sheadsection{Photometric Evaluation Function}

Consider photometric transformations which relate a surface to two images.
Photometric differences between images are the rule, rather than the
exception.  Photometric differences arise from sensor variations and
from variation of surface reflectivity with viewing angle.

Sensor differences include overall sensor scale factors and
local variations over the sensor face.  Some examples are:
variations in sensitivity and processing of film,
inhomogeneity of film, pixel to pixel variations in response of
solid-state sensors.  Sensor differences can largely be compensated for.

Reflectivities of real surfaces vary with viewing angle.
There is no reason that surfaces viewed from as much as 60 degrees apart
should appear similar.  The implicit assumption which goes unexamined in
stereo correspondence is that intensities in the two images are equal,
or equal after a scaling.  It is worthwhile to examine that assumption.

 It can be shown analytically that
for matte surfaces, image intensity at a point is independent of viewpoint.
It turns out that many natural surfaces are matte, and that many
painted surfaces and building materials have matte reflectivity.
A reasonable approximation is thus that intensities of corresponding areas of
stereo images are identical.  This approximation is better for small
angle stereo than for aerial mapping which uses large baselines.
Approximate viewpoint independence of image intensities is an underlying 
basis for stereo correspondence. 
If there were wide variations in reflectances
with viewpoint, it would not be valuable to infer reflectance.
This suggests experiments to quantify human interpretations for stereo
match as a function of photometric variation, \ie contrast reversal.
In the visible spectrum, some surfaces are specular
reflectors, like water.  They may appear darker than the shore from
one viewpoint, lighter from another, \ie exhibit contrast reversal.
Specular reflection is much more common for SAR images.
Provided that the evaluation function weights correspondences appropriately,
the assumption of approximate photometric invariance provides a 
useful quantitative discrimination of true correspondences
versus random matches, even if only some surfaces are invariant.

Photometric differences can arise in other ways, \ie actual changes
like clouds and sun angle differences.  Typical mapping sequences
cover only a few seconds.  Thus, these time-dependent effects are
probably not very important.

This suggests the basis for deriving a quantitative photometric evaluation 
function. Compute the overall likelihood as the product of likelihoods for
each region pair assumed to correspond in a particular interpretation.
Individual likelihoods indicate how likely it is that a random pair would
match intensities as well as observed or better.
The effect of the photometric component of the evaluation function is to
quantify the concept of the least random match based on approximate
photometric invariance. 
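One possible form of this computation is sketched below; the use of mean
region intensities and an empirical background distribution of random-pair
differences are assumptions of the illustration, not the final design of the
evaluation function.
\quote{
# Sketch of the photometric component: each hypothesized region pair is
# scored by the probability that a randomly chosen pair would match
# intensities as well as observed or better, and the pair scores are
# combined as a product (here, a sum of logs).
import numpy as np

def random_match_pvalues(mean_left, mean_right, n_random=10000, seed=0):
    mean_left = np.asarray(mean_left, dtype=float)     # mean intensity per region, image 1
    mean_right = np.asarray(mean_right, dtype=float)   # mean intensity per region, image 2
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(mean_left), n_random)
    j = rng.integers(0, len(mean_right), n_random)
    random_diffs = np.abs(mean_left[i] - mean_right[j])         # background distribution
    observed = np.abs(mean_left - mean_right)                    # pairs in the interpretation
    return (random_diffs[None, :] <= observed[:, None]).mean(axis=1)

def interpretation_log_likelihood(pvals, floor=1e-6):
    return np.sum(np.log(np.maximum(pvals, floor)))              # smaller means less random
}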

\ignore{Within the paradigm of finding the least random match, it is not
necessary to talk about true matches.  We know something about
true matches, however.
To estimate the likelihood of a true correspondence, we need an estimate
of the distribution of differences for true matches.    The distribution
can be recorded  from history for  stereo mapping which  has already  been
done (automatic or manual).  The least random photometric match is
adequate to accumulate the history.
It is useful to exclude from the distributions those materials which
are known to have large differences resulting from specular reflection,
especially water and bare metals.
This sharpens our discrimination for surfaces which do correspond.
That is, we are assuming that there are a subset of surfaces which
correspond better than others, and that we benefit by finding those
which match especially well.
We can estimate that the number of true matches equals the number of edges
within the overlap region of the two images, assuming that most have a match.
If we have initial 
knowledge of registration of the images and digital terrain elevation
maps, both to typical accuracy for DMA applications, then we can calculate
the overlap region.
}


An additional element which affects photometric and geometric evaluation
is the failure of edge operators, \ie missing and spurious edge elements
which change both geometric interpretation and photometric measurements.
Spurious edges can be a serious problem at the local level of epipolar lines;
they are less of a problem for extended edges because any spurious edge
elements are unlikely to satisfy constraints for matching along the
whole of the edge.  For missing edges, the same semantic interpretation
of differences is useful.  Given the contrast and surface material
at an edge, is it reasonable under the viewing conditions that the
proposed edge would be missed?

\sheadsection{Geometric Evaluation Function}
      
Consider first geometric invariants.  One of these is the
invariance of the elevation coordinate in stereo coordinates, normal to
the direction of epipolar lines.  Two views of
the same point must have the same elevation coordinate.  If the combined
effect of measurement errors, \ie point scatter, and uncertainty in the
registration parameters combine to make an uncertainty in the elevation
coordinate of $\epsilon$, the likelihood of another point corresponding is given
by the distribution function of elevation differences, typically a gaussian
with standard deviation ${(\epsilon↓1↑2+\epsilon↓2↑2)}↑{1\over 2}$. 
A crude estimate of the probability of false match of a curve is the product
of probabilities of a random match at each end, $(\epsilon/H)(\epsilon/H)$, where
$H$ is the picture height normal to the direction of epipolar lines.
This assumes that there are no obscurations,
and that distributions of end points of curves are random and independent.
Obviously, this is a coarse estimate, but useful now.  
In real pictures, buildings are arranged regularly, and flight paths
may be aligned with building arrangements, so that there may
be systematic concentrations which change these probabilities.  They
may make coincidences more likely by being exactly aligned with a building
grid, or make coincidences less likely by being slightly out of alignment. 
If we use the distribution of endpoints of curves for the actual pictures,
probability estimates of random coincidences will be adequate.

If combined elevation error $\epsilon$ is 5 pixels and the picture is  1000 lines,
then the probability of false match is 1/800.  That is quite small.
But we must introduce obscuration.  Any obscured edge can match 
geometrically with any edge which is consistent with limits on the length
of the obscured edge.  If we assume that $15\%$ of edges are obscured in
aerial photos (we guess that the fraction is less), then this is the major
contributor to ambiguity of curve matching.  Roughly $10\%$ of those obscured
curves are estimated to be consistent with geometrical conditions for matches.
The availability of crude terrain elevation data further restricts the
potential ambiguity, as described below.
As we shall see, this ambiguity is much reduced by monocular interpretation.
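The two quantities just described can be written down directly; the following
sketch simply restates the gaussian elevation-difference likelihood and the
crude false-match estimate in executable form.
\quote{
# The gaussian elevation-difference likelihood and the crude false-match
# estimate described above, written out directly (a restatement, not an
# extension, of the text).
import math

def elevation_likelihood(dz, eps1, eps2):
    sigma = math.sqrt(eps1 ** 2 + eps2 ** 2)           # combined elevation uncertainty
    return math.exp(-0.5 * (dz / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def false_curve_match_probability(eps, picture_height):
    return (eps / picture_height) ** 2                 # random coincidence at both endpoints
}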


Vertical edges appear vertical in both views.  The component of the
evaluation function for verticals is the likelihood of random coincidence
for a match to the vertical hypothesis that is as good or better.  The same
type of likelihood is calculated for horizontal assumptions, and for
vertical walls meeting at right angles.  Cube corners must satisfy
an inequality which can be turned into a weak likelihood of satisfying
the inequality for a triple of edges.  A horizontal surface must be planar,
of course.  For vertical photography, projections of horizontal surfaces 
are identical in any views.  

Results from inference of surface structure from monocular cues have
similar evaluation.  Inference rules include shadows, surface markings,
and limbs.  Interpretation of an image curve as a limb implies a surface
constraint.  Silhouettes of a cylinder in two views do not correspond
exactly, only approximately.  The rigorous constraint is that the surface is
tangent to the viewing ray at the silhouette from two different viewpoints.
This provides a solution to what has been a difficult problem.
Shadows provide actual 3-D measurements of edge locations and orientations
[\lowe].
We believe that there are transition rules for arbitrary vertices which
specify how a corner can change from one view to the other, given an inferred
interpretation for the corner.  We expect to analyze and incorporate these
transition rules.

\sheadsection{Considerations for a Search Procedure}

Given an evaluation function, a search procedure determines the value of
the function on all possible solutions or a subset of possible solutions.
Systems at Kiev [\gimel], CDC [\hendersonb], and Stanford ([\bakerijvanc],
[\arnoldthesis]) used the Viterbi algorithm, a dynamic
programming algorithm, to find the 
optimal stereo match of planar surfaces or edges along a single epipolar line.
Optimality is with respect to their choice of evaluation function.  The 
evaluation function may not include all constraints, or may not represent
constraints well.  Matches on different epipolar lines are independent.
To incorporate consistency on adjacent epipolar lines requires an
extension of dynamic programming to three dimensions.  A straightforward
extension has been derived, but it is not feasible computationally.
There is good reason to think that efficient dynamic programming solutions
exist which are feasible computationally, but no solution has been found yet.
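For concreteness, a minimal dynamic-programming matcher of the kind cited is
sketched below.  It is a generic ordered alignment of features along conjugate
epipolar lines with a uniform skip penalty, not a reproduction of the Kiev,
CDC, or Stanford implementations.
\quote{
# Minimal Viterbi-style dynamic-programming matcher along an epipolar line.
# Ordering (monotonicity) is enforced, so order reversals caused by overhang
# are not representable, as noted in the text.
import numpy as np

def match_epipolar_line(cost, skip=1.0):
    # cost[i, j]: evaluation-function cost of matching left feature i with
    # right feature j (lower is better); skip: penalty for an unmatched feature.
    n, m = cost.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = skip * np.arange(m + 1)
    D[:, 0] = skip * np.arange(n + 1)
    back = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            choices = (D[i - 1, j - 1] + cost[i - 1, j - 1],   # match i-1 with j-1
                       D[i - 1, j] + skip,                     # leave left feature unmatched
                       D[i, j - 1] + skip)                     # leave right feature unmatched
            back[i, j] = int(np.argmin(choices))
            D[i, j] = choices[back[i, j]]
    pairs, i, j = [], n, m                                     # backtrace the optimal match
    while i > 0 and j > 0:
        if back[i, j] == 0:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif back[i, j] == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], pairs[::-1]
}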

The solution adopted is intended to make use of local interpretations, which
are often correct, but it must be tolerant of mistakes.
The procedure is like relaxation, \ie form local nuclei and use the
evaluation function to quantify local consistency.  This procedure is
iterated.  

Efficiency is a major consideration.  Reduction of ambiguity is a major
mechanism for efficiency.  Making use of terrain elevation data, where
available, can limit ambiguity.

Terrain elevation data are available for large areas of the earth's surface
with a contour interval of about 50 feet.  These terrain data are crude
and probably contain many mistakes.  For appropriately chosen standard
deviation, elevation data presumably contain a large fraction of realistic
estimates of elevation.  Our approach is to make use of relations which
are satisfied for subsets of the data, but not satisfied uniformly,
and to be insensitive to those data for which relations are not satisfied.
An example was shown above for approximate photometric invariance.
In this case, large areas of the earth's surface are known to a standard
deviation of 5m corresponding to an interval bound of 50 feet.  The particular
value is useful for discussion below.  Better estimates will be made.
Typical DMA pictures have a scale of 40,000:1, which for a pixel of 10 microns
on film, corresponds to about .5m on the ground.  For typical mapping coverage,
the horizontal uncertainty on the ground is about equal to the vertical
uncertainty.  Thus, an image element corresponds to similar elements within
a range of about plus or minus 10 pixels in the paired image.
If we assume that typical small buildings are about 100 feet on
a side, with spacing of 50-100 feet, there will not be many ambiguous matches
for image curves, much less than one spurious match on average.  
Surface interpretation procedures permit classifying edges and corners
as, \eg the SE, SW, NW, NE corners of a rectangular building.  In that case,
only corresponding elements could match, \eg SW corner with SW corner.  They
would be separated by 150 feet or so, not consistent with the terrain
elevation data which is uncertain to 5m.
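The arithmetic of this bound is summarized in the following sketch; the
nominal figures are those quoted above, and the base-to-height ratio of one is
the stated assumption for typical mapping coverage.
\quote{
# Worked form of the search-range bound: with elevation known a priori to
# sigma_z, photographic scale s, pixel size p on film, and base-to-height
# ratio b (about 1 for typical mapping coverage), the disparity search
# window is roughly plus or minus sigma_z * b / (s * p) pixels.
def disparity_window(sigma_z_m=5.0, scale=40000.0, pixel_m=10e-6, base_to_height=1.0):
    ground_pixel_m = scale * pixel_m                  # ground distance covered by one pixel
    return sigma_z_m * base_to_height / ground_pixel_m

# With the nominal figures above this is about 12 pixels, of the same order
# as the plus-or-minus 10 pixel estimate quoted in the text.
}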

These arguments indicate that the level of ambiguity of correspondence of 
extended curves and junctions will be very low.  The approach of starting
with a coarse terrain model and refining the model with stereo correspondences
appears promising.  Relations and invariants true for a subset of the data can
be used in the following way. Find matches based on these relations or 
invariants which are consistent with other constraints.  These matches will
usually be satisfied for the appropriate subset, and not usually satisfied
by unrelated elements.  Mistakes in elevation data would cause some relations
to be missed; however, they would likely be found from consistency with
adjacent elements.  Good matches will provide nuclei for correct interpretation.

The conventional search paradigm assumes that there is great ambiguity in matching.
We assume that the opposite is true.
In working with extended curves, junctions, and regions, the level
of ambiguity is low.  That is, there are relatively few spurious matches
for each element.  Thus, it is possible to lessen the computational
burden by finding local good matches as nuclei.  They can be found by
testing in parallel the few matches which are possible for each of
a small set of elements chosen to be distinctive.
That small set includes extended curves, junctions, regions, and
interpretations from inference rules.

The process starts in many places in parallel.
Some of them will have wrong initial interpretations; many will be correct.
Those wrong starting interpretations will not be consistent with neighbors,
so wrong interpretations will not extend far.  Correct starting interpretations
will extend to those places with wrong initial interpretations.
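A schematic version of this nucleus-and-growth procedure is sketched below;
the data structures and the consistency predicate are placeholders for the
constraints and evaluation function discussed above.
\quote{
# Schematic nucleus-and-growth procedure: start from distinctive features with
# a unique admissible match, then repeatedly accept candidate matches that are
# consistent with already-accepted neighbors.
def grow_matches(candidates, neighbors, consistent):
    # candidates: feature -> list of admissible matches in the other image
    # neighbors:  feature -> list of adjacent features
    # consistent(match_a, match_b): True if two matches agree geometrically
    accepted = dict((f, ms[0]) for f, ms in candidates.items() if len(ms) == 1)
    changed = True
    while changed:
        changed = False
        for f, ms in candidates.items():
            if f in accepted:
                continue
            for m in ms:
                support = [g for g in neighbors.get(f, []) if g in accepted]
                if support and all(consistent(m, accepted[g]) for g in support):
                    accepted[f] = m            # neighbors agree: extend the nucleus
                    changed = True
                    break
    return accepted
}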

In absence of other evidence,
if a line in the image appears vertical, assume that it is vertical.
That limits the search for corresponding lines to those that appear
vertical in the other view.  That does not commit to a vertical interpretation,
but it does take advantage of verticals where they exist.


\sheadsection{Search Strategies}

Coarse-to-fine correspondence strategies and successive refinement
matching will be incorporated into the processing for
increased noise immunity and computational efficiency.

	Coarse-to-fine correspondence strategies implement efficient matching.
Several forms of coarse-to-fine correspondence matching have been developed
and implemented in stereo work here at Stanford, and elsewhere
([\moravec], [\grimson], [\bakerijvanc]).  Where
appropriate, they will be incorporated into this effort.  No further
development in such strategies is anticipated, except in response to
opportunities for system improvement.

	Successive refinement matching works from parts of an image pair
where matching is simple (large areas, distinctive features) to
parts of images which are more difficult to match (boundaries, wires, poles,
small structures). This approach parallels and complements
coarse-to-fine correspondence.
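A minimal sketch of the coarse-to-fine organization follows; the pyramid
construction by 2x2 averaging and the interface of the single-level matcher
are illustrative assumptions.
\quote{
# Minimal coarse-to-fine sketch (image dimensions are assumed divisible by
# 2**levels).  Each level's disparities, doubled, seed the search at the next
# finer level; match_level stands in for any single-level matcher above.
import numpy as np

def pyramid(image, levels=4):
    levs = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        im = levs[-1]
        levs.append(0.25 * (im[0::2, 0::2] + im[1::2, 0::2] +
                            im[0::2, 1::2] + im[1::2, 1::2]))
    return levs[::-1]                                  # coarsest level first

def coarse_to_fine(left, right, match_level, levels=4):
    # match_level(left, right, guess) returns a disparity array; guess is a
    # prior disparity estimate or None at the coarsest level.
    disparity = None
    for lev_l, lev_r in zip(pyramid(left, levels), pyramid(right, levels)):
        guess = None if disparity is None else 2.0 * np.kron(disparity, np.ones((2, 2)))
        disparity = match_level(lev_l, lev_r, guess)
    return disparity
}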


\newpage
\setchapter{\timetable}
\setchaptitle{Timetable}
\chapheading{TIMETABLE}

\quote{
Month

-------- October 1982 -------------------------------------------------------
1

2

3    Selection of appropriate test images ongoing for first 6 months
     Research in algorithms ongoing
4

5

6    Design Review Research System
     Report
-------- April 1983 ---------------------------------------------------
7    Research in algorithms ongoing

8    

9   

10   Preliminary Functional Specification for Follow-On Development

11   Research in algorithms ongoing
     Design Review Research System

12   Review of Preliminary Functional Specifications for Follow-on
     Report
-------- October 1983 -------------------------------------------------------
13   

14   Implementation Research System ongoing
     Consult on Follow-On design

15   Revised Functional Specification Complete
}
\newpage
\quote{
Month

16   Complete Initial Implementation of
         Research System

17   Implementation Review Research System

18   
     Report
-------- April 1984 ---------------------------------------------------
19

20   Research in algorithms ongoing
     Participate in design review of Follow-On Development effort
21

22   Integration of new algorithms ongoing

23   Complete final implementation of
         Research System

24   Analysis and testing ongoing
-------- October 1984 -------------------------------------------------------
                      
25

26   Final report; Termination of Project
}

\newpage
\setchapter{\deliverables}
\setchaptitle{Deliverables}
\chapheading{DELIVERABLES}

Deliverables include:
\chart{1)}{Evaluation of applicable stereo reconstruction techniques;}
\chart{2)}{Algorithms and analyses used for direct input to a potential 
follow-on development effort as they become available;}
\chart{3)}{A final report containing:}

\bullit{accomplishments of the research;}
\bullit{algorithms and science base for stereo compilation;}
\bullit{evaluation of technology at the time of completion including:}
\sbullit{analysis of fundamental limits of performance of algorithms;}
\sbullit{projections for technology and performance; }
\sbullit{and recommendations for future action.}

\newpage
\setchapter{\facilities}
\setchaptitle{Facilities}
\chapheading{FACILITIES}

The Robotics group of the Stanford Artificial Intelligence Laboratory has
substantial facilities including a large VAX 11/780 system and shares
facilities of the Computer Science department, including SAIL and SCORE
time-sharing systems, and the Ethernet.  Details follow.  In robotics,
researchers have a large base system on which to build.  A decade of
working robotics laboratory systems means that there is a
large base of worked examples which students can use and modify.

Facilities for integrated circuits in the Center for Integrated Systems
are outstanding.  Design and fabrication facilities are already excellent.
Fast-turnaround fab facilities will be available in the near future.
Facilities for sensor development in the Integrated Circuits Lab and in
Applied Physics are excellent.  The Guidance and Control Laboratory has
fine facilities.

It is often overlooked, but the software environment is as important as
equipment.  The style of using the hardware and software environment is
also important; Stanford benefits from a decade of developing and using
top-quality systems, and from being in the Silicon Valley, a focus of high
technology development.

A component of the proposed program is to augment the experimental
laboratories and technical support (using student staff) where necessary
to accommodate increased use of facilities to support education and
training.  It is important to utilize time of students and visiting staff
effectively by providing adequate facilities.


\headsection{VISION AND ROBOTICS EXPERIMENTAL SYSTEM}

Two Unimate 600 (PUMA) arms and a pair of Stanford Scheinman arms are used
for experiments in assembly.  The Stanford arms were designed and built
here, the first in 1970, the second in 1972. One of the PUMA arms has a
servo-controlled hand and wrist force sensor.  The other will have a
similar hand and force sensor soon.  One Stanford arm has a wrist force
sensor which measures three components of force and three components of
torque.  It measures a 300 gm weight to within 5 gm.  There is a simple
program which automatically calibrates the wrist.  Two pairs of force
sensing fingers have been built, the first measures one component of
force, the other measures three components.  Their sensitivity is .1 gm or
so.  An IBM RS-1 robot and IBM 7335 robot are expected in 1982.  A GCA
robot built by Dainichi Kiko is expected in 1982.

The arms are located in an assembly workstation which has a flexible and
accurate setup capability, with accurate locating holes and screw holes
for clamps.  A computer controlled vise (with position feedback) and
screwdriver (with velocity feedback) are used for assemblies.  More
assembly tools are planned, including a moving belt.

The group now uses the SAIL computer system primarily (see below).  A VAX
11/780 has just been delivered.  It has 8 M bytes of memory, two RP07
fixed media disks with 512 Mbytes of storage each, an RM03 67 Mbyte disk
with removable media, and a TU-78 tape drive with 6250 bpi.  A PDP11/45
with 124k wds of memory controls the arms.  It is interfaced to the KL10
I/O bus and the Ethernet.  A PDP11/60 is being interfaced for real-time
arm control, with the PDP11/45 running the AL system.  The group has one
SUN workstation (see below).  It will acquire others and share other SUN
workstations with the computer science department.  The group will share
LISP machines with the department, but the number does not appear to be
adequate.  It has one Imagen laser printer with capability for arbitrary
font and book-quality print.

\JA pair of solid state cameras, GE TN2500, is used for stereo vision
and other vision work at the robotics workstation.
There is also a Fairchild CCD camera with resolution 380x480.
A vision module, Machine Intelligence VS-100, is connected to a
GE TN2500 solid state camera together with a light  table.  It analyzes
silhouettes of objects for classification and location, and is coupled
to the AL system.
There is an Optronics C4500 digitizer for digitizing color photographs.\.

\JA Grinnell GMR-270 display system with 512x512 resolution with 36 bits per 
pixel provides gray scale and graphics.  It 
is shared among users by a video switch.  The system accommodates various combinations
of users: four users with 9 bit black and white
images, two users with two 9 bit channels each, or one user with 3 
color channels of 9 bits each plus
another user with a black and white channel are some of the possible configurations.
The display system has an arithmetic processor and tv digitizer.\.

\JThe robotics system will be a distributed system which communicates
over an Ethernet [Metcalfe 76].  There will be a few additional direct interfaces.
The Ethernet is a coax cable with about 2 megabit rate and a simple protocol.  
This scheme requires only one hardware interface and one software interface for each
new device instead of a hardware and software interface for every pair of 
devices.\.

\JDigitized stereo images may be viewed and processed at a stereo station.
The station is comprised of a pair of black and white monitors, 17" Conrac QQA-17C
high resolution video monitors, and a modified Old Delft III scanning
stereoscope.  The Grinnell display system provides video to the stereo station monitors.
The cursor can be manipulated by trackball
control through the PDP-11 and PDP-10 computers.\.

\JThere is a computer-controlled animation box which enables the making of
still pictures and motion pictures.  The device has single frame exposure
and control of color filters, used with multiple exposures from a black
and white monitor.\.

\F0\CSAIL

\JThe system is display-oriented, with full graphics display
terminals in nearly every office.
Display editors and other system software provide unusual
user facilities.  High quality printing and graphics devices provide
publication facilities which are supported by software 
developed at the AI laboratory (PUB, TEX for
mathematical publication, and POX).
Graphics printing devices include: the XEROX Graphics Printer (XGP),
XEROX Dover Printer
(high resolution bit raster), Varian 22" printer/plotter (bit raster),
Canon laser beam printer, and Alphatype photo composition printer.
Extensive graphics packages exist for vector graphics, gray scale and
overlay, for display and for hard copy.\.

\JThe main computer is a Digital Equipment Corporation KL10 with 11 megabytes
of 36 bit words.  It has 8 drives of Ampex 3330-11 discs, 10.5x10↑9 bits.
Peripherals include a DataProducts printer, a Calcomp plotter, two 7 track
tape drives, and 4 Dectape drives.\.

\JThere are 53 displays with 32 channels of Datadisc digital raster memory,
6 III vector displays, 3 Imlac vector displays, 35 Datamedia character
displays, 15 teletype terminals, and 5 TI terminals.  The Datadisc displays
are interconnected by a digital switch which allows overlay of channels.
The Datadisc supports graphics and gray scale by overlay of channels.
The primary graphics and gray scale display device is a Grinnell 
raster buffer display with 512x512 pixels with 36 bits per pixel.\.

\JA DEC KA10 serves as peripheral processor. A DEC PDP-11/45 with 80 kwds
of 16 bit memory and with an SPS-41 array processor acts as a real-time
processor for robotics.  A TI multiprocessor system also is used for robotics.\.

\CSCORE

\JSCORE is a DECSYSTEM-2060 running
TOPS-20.  SCORE includes 1024K words of main memory and 800 million
bytes of online storage.  SCORE supports 64 local and dial-access terminals,
and ARPAnet access.  It is jointly owned by the departments of
Computer Science,
Electrical Engineering, and Operations Research.  All students in a degree
program in Computer Science have access to SCORE.  SCORE is operated by
the Computer Science Department.\.

\CSUMEX

\JSUMEX is a dual processor DECsystem-1060 running TENEX.  It is a national
facility owned by the National Institutes of Health and managed by a
national governing board.  Applications of artificial intelligence to
problems in medicine and biology are the prime research foci of this
facility.  Students whose research work involves such applications may be
granted access to SUMEX.  The SUMEX facility supports local and dial-access terminals,
ARPAnet, and other national networks.  Additional equipment associated with SUMEX includes
a DECSYSTEM-2020 and a DEC VAX-11/780.  These several systems are also on
the Computer Science Ethernet local network.\.

\CCSL

\JClose collaboration is maintained between the Electrical Engineering department and
the Computer Science department.  Research in hardware and software
computer systems is done in the Computer Systems Laboratory (CSL).  CSL is
a laboratory of the Electrical Engineering Department populated by faculty
from both Electrical Engineering and Computer Science.  CSL operates
a VAX-11/780 system for VLSI system design.  Also, CSL has a variety
of small computer systems.\.

\CNetwork

\JA BBN TIP (Honeywell DDP-516) provides connection to
the ARPANET.  Systems are interfaced to a local network, the
Ethernet, which connects SAIL, SCORE, many DEC VAX systems, DEC PDP11s,
and a network file server.

The Computer Science Department also operates several Xerox Alto personal
computers, linked together by the Ethernet communications network.  Xerox
has provided these Altos, a Dover printer, and a network file system, as
an equipment grant to the Computer Science Department.\.

\CPlans

\JStanford has developed the SUN terminal, a high resolution graphic display,
800x1000, with a Motorola 68000 processor.  Extensive software support
based on UNIX and C is being developed. 

Computer Science and CSL plan to acquire 35 SUN terminals,
10 VAX 11/750s and 15 LISP machines in 1982.  A large file server for the
entire SU network will provide economies of scale in storage.
Smaller file servers for clusters of SUN terminals are planned.  Since
cluster file servers are based on large disks, they are more economical
and provide higher performance than small disks for individual SUN machines.\.

\setchapter{\budget}
\setchaptitle{Budget}
\chapheading{BUDGET}
\vspsml
\ctrline{\_{Budget for the period Oct 82 thru Sept 83}}

\vspsml
\halign{#\hfil⊗\hskip .5vu\hfil#\hfil\cr
PROPOSAL TO:⊗Rome Air Development Center\cr
TITLE:⊗Digital Stereo Reconstruction Study\cr
SUBMITTED BY:⊗Artificial Intelligence Laboratory, Stanford University\cr}

\vspsml
\halign{#\hfil⊗#\hfil⊗\hskip 1.0vu\hfil#\cr
A.⊗SALARIES AND WAGES⊗\cr
⊗  1. Senior Personnel⊗\cr
⊗\hskip .5vuBINFORD $(25\%)$⊗13,613.00\cr
⊗\hskip .5vuBAKER $(50\%)$⊗4,641.00\cr
⊗  2. 6 Student Research Assistants⊗65,430.00\cr
⊗  3. Support Personnel⊗\cr
⊗ \hskip .5vuSIROKER $(25\%)$⊗7,716.00\cr
⊗⊗--------------\cr
⊗  Total Salaries and Wages⊗101,400.00\cr
B.⊗STAFF BENEFITS ($20.4\%$ Oct 82-Aug 83)⊗\cr
⊗\hfil($22.1\%$ Sept 83)⊗20,889.00\cr
⊗⊗--------------\cr
C.⊗TOTAL SALARIES, WAGES AND STAFF BENEFITS⊗122,289.00\cr
D.⊗CAPITAL EQUIPMENT⊗35,000.00\cr
E.⊗EXPENDABLE SUPPLIES AND EQUIPMENT⊗3,000.00\cr
F.⊗TRAVEL⊗\cr
⊗  1. Foreign⊗0.00\cr
⊗  2. Domestic⊗8,000.00\cr
G.⊗PUBLICATIONS⊗2,600.00\cr
H.⊗OTHER COSTS⊗\cr
⊗  1. Communication (telephone)⊗1,800.00\cr
⊗  2. Computer cost⊗24,000.00\cr
⊗  3. Minor equipment and repair⊗1,800.00\cr
⊗⊗--------------\cr
I.⊗TOTAL COSTS (A thru H)⊗198,489.00\cr
J.⊗INDIRECT COSTS (percentage of A thru H, less D)⊗112,807.00\cr
⊗⊗--------------\cr
⊗  tuition remission⊗0.00\cr
K.⊗TOTAL COSTS⊗311,296.00\cr
}

\newpage
\ctrline{\_{Budget for the period Oct 83 thru Sept 84}}


\vspsml
\halign{#\hfil⊗\hskip .5vu\hfil#\hfil\cr
PROPOSAL TO:⊗Rome Air Development Center\cr
TITLE:⊗Digital Stereo Reconstruction Study\cr
SUBMITTED BY:⊗Artificial Intelligence Laboratory, Stanford University\cr}

\vspsml
\halign{#\hfil⊗#\hfil⊗\hskip 1.5vu\hfil#\cr
A.⊗SALARIES AND WAGES⊗\cr
⊗  1. Senior Personnel⊗\cr
⊗\hskip .5vuBINFORD $(25\%)$⊗14,974.00\cr
⊗\hskip .5vuBAKER $(50\%)$⊗16,105.00\cr
⊗  2. 6 Student Research Assistants⊗71,973.00\cr
⊗  3. Support Personnel⊗\cr
⊗ \hskip .5vuSIROKER $(35\%)$⊗8,488.00\cr
⊗⊗--------------\cr
⊗  Total Salaries and Wages⊗111,540.00\cr
B.⊗STAFF BENEFITS ($22.1\%$ Oct 83 - Aug 84)⊗\cr
⊗\hfil($22.4\%$ Sept 84)⊗24,690.00\cr
⊗⊗--------------\cr
C.⊗TOTAL SALARIES, WAGES AND STAFF BENEFITS⊗136,230.00\cr
D.⊗CAPITAL EQUIPMENT⊗35,000.00\cr
E.⊗EXPENDABLE SUPPLIES AND EQUIPMENT⊗3,300.00\cr
F.⊗TRAVEL⊗\cr
⊗  1. Foreign⊗0.00\cr
⊗  2. Domestic⊗8,800.00\cr
G.⊗PUBLICATIONS⊗3,000.00\cr
H.⊗OTHER COSTS⊗\cr
⊗  1. Communication (telephone)⊗1,800.00\cr
⊗  2. Computer cost⊗26,400.00\cr
⊗  3. Minor equipment and repair⊗1,980.00\cr
⊗⊗--------------\cr
I.⊗TOTAL COSTS (A thru H)⊗216,510.00\cr
J.⊗INDIRECT COSTS (percentage of A thru H, less D)⊗125,241.00\cr
⊗  tuition remission⊗0.00\cr
⊗⊗--------------\cr
K.⊗TOTAL COSTS⊗341,751.00\cr
}
\newpage